Final Project

Instructions

  • The project is to be carried out by groups of 3-4 students.
  • It must be presented in person on December 15th.
  • This project accounts for 40% of your final grade.
  • The presentation should last no longer than 20 minutes.

Development

Choose a relevant dataset that includes at least three categorical columns and three numerical columns. Select a target variable for your predictions.
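A quick way to confirm that a candidate dataset meets the column requirement above is to count its columns by type. The following is a minimal sketch using pandas; the function names and the toy data are hypothetical, and it assumes that every non-numeric column counts as categorical, which may not hold for free-text or date columns.

```python
import pandas as pd


def split_column_types(df: pd.DataFrame):
    """Return (numerical, categorical) column names.

    Assumption: any column that is not numeric is treated as categorical.
    """
    numerical = df.select_dtypes(include="number").columns.tolist()
    categorical = [c for c in df.columns if c not in numerical]
    return numerical, categorical


def meets_minimum(df: pd.DataFrame, min_each: int = 3) -> bool:
    """Check for at least `min_each` numerical and categorical columns."""
    numerical, categorical = split_column_types(df)
    return len(numerical) >= min_each and len(categorical) >= min_each


# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "age": [25, 32, 47],
    "income": [30000.0, 52000.0, 81000.0],
    "visits": [1, 4, 2],
    "city": ["Lima", "Quito", "Lima"],
    "plan": ["basic", "pro", "basic"],
    "gender": ["F", "M", "F"],
})

numerical_cols, categorical_cols = split_column_types(toy)
print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
print("Meets requirement:", meets_minimum(toy))
```

Numerically coded categories (for example, a region stored as 1, 2, 3) would be counted as numerical here, so the result should be checked against what the columns actually mean.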

You must clean the data, analyze it, and select the appropriate variables for your model.
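One possible shape for the cleaning step is sketched below with pandas. It is only an illustration: the function names and toy data are hypothetical, and the choices made here (dropping exact duplicates, filling numeric gaps with the median, filling other gaps with the most frequent value, one-hot encoding) are assumptions that may not suit every dataset.

```python
import pandas as pd


def basic_clean(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows and fill missing values column by column."""
    cleaned = df.drop_duplicates().copy()
    numeric_cols = cleaned.select_dtypes(include="number").columns
    for col in cleaned.columns:
        if col in numeric_cols:
            # Assumption: the median is a reasonable fill for numeric gaps.
            cleaned[col] = cleaned[col].fillna(cleaned[col].median())
        else:
            # Assumption: the most frequent value fills non-numeric gaps.
            modes = cleaned[col].mode(dropna=True)
            if not modes.empty:
                cleaned[col] = cleaned[col].fillna(modes.iloc[0])
    return cleaned.reset_index(drop=True)


def encode_features(df: pd.DataFrame, target: str):
    """Separate the target and one-hot encode the non-numeric features."""
    X = pd.get_dummies(df.drop(columns=[target]))
    y = df[target]
    return X, y


# Hypothetical toy data with one duplicate row and two missing values.
raw = pd.DataFrame({
    "age": [20.0, None, 40.0, 40.0],
    "city": ["A", "B", None, None],
    "bought": [0, 1, 1, 1],
})

clean_demo = basic_clean(raw)
X_demo, y_demo = encode_features(clean_demo, target="bought")
print(clean_demo)
print(X_demo.columns.tolist())
```

Rows with a missing target value are usually better dropped than imputed, and any imputation choice is worth justifying in the analysis.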

Train and evaluate Logistic Regression, K-Nearest Neighbors (KNN), Decision Tree, and Random Forest models. Build a table comparing their results across all evaluation metrics, including ROC AUC.
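The comparison could be organized along the lines of the sketch below, which uses scikit-learn. It assumes a binary target and an already encoded, fully numeric feature matrix. The synthetic data from make_classification, the function name, the 75/25 split, and the particular metrics shown are illustrative assumptions, not requirements of the project.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, random_state=42):
    """Fit four classifiers on one split and tabulate their test metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)
    models = {
        # Scaling matters for distance- and gradient-based models.
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)),
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(random_state=random_state),
    }
    rows = []
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        # Probability of the positive class, needed for ROC AUC.
        proba = model.predict_proba(X_test)[:, 1]
        rows.append({
            "Model": name,
            "Accuracy": accuracy_score(y_test, pred),
            "Precision": precision_score(y_test, pred, zero_division=0),
            "Recall": recall_score(y_test, pred, zero_division=0),
            "F1": f1_score(y_test, pred, zero_division=0),
            "ROC AUC": roc_auc_score(y_test, proba),
        })
    return pd.DataFrame(rows).set_index("Model").round(3)


# Synthetic stand-in data; replace with your own cleaned features and target.
X_demo, y_demo = make_classification(n_samples=400, n_features=8,
                                     random_state=0)
results = compare_models(X_demo, y_demo)
print(results)
```

A single train/test split keeps the sketch short; cross-validation would give more stable estimates, and a multiclass target would need averaged precision, recall, and F1 plus a multiclass ROC AUC setting.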

Perform K-means clustering on your data and explain the characteristics of each cluster.
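One way to approach the clustering step is sketched below with scikit-learn: scale the numeric features, fit K-means, and summarize each cluster by its size and feature means. The function name, the column names, the synthetic data from make_blobs, and the choice of three clusters are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler


def profile_clusters(df: pd.DataFrame, n_clusters: int = 3,
                     random_state: int = 0):
    """Cluster numeric features and return (labeled data, cluster profile)."""
    # K-means is distance based, so features are standardized first.
    scaled = StandardScaler().fit_transform(df)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=random_state)
    labels = kmeans.fit_predict(scaled)
    labeled = df.assign(cluster=labels)
    # Per-cluster means in the original units, plus the cluster sizes.
    profile = labeled.groupby("cluster").mean()
    profile["size"] = labeled.groupby("cluster").size()
    return labeled, profile


# Synthetic stand-in data with hypothetical column names.
points, _ = make_blobs(n_samples=300, centers=3, n_features=2,
                       random_state=0)
demo = pd.DataFrame(points, columns=["feature_a", "feature_b"])

labeled_demo, profile_demo = profile_clusters(demo, n_clusters=3)
print(profile_demo.round(2))
```

The profile table is a starting point for describing each cluster in words. The number of clusters is a choice to justify, for example by comparing inertia (the elbow method) or silhouette scores across several values of k.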

Grading Rubric

The 100 points for this project will be divided as follows, with each category evaluated at three levels: Achieved, Partially Achieved, and Not Achieved.

  1. Data Selection and Cleaning (20 points)

    • Achieved (20 points): Relevant dataset with appropriate cleaning and preprocessing.
    • Partially Achieved (10-15 points): Dataset is relevant but requires more thorough cleaning or preprocessing.
    • Not Achieved (0-9 points): Dataset is not relevant or poorly cleaned/preprocessed.
  2. Data Analysis and Variable Selection (20 points)

    • Achieved (20 points): Comprehensive analysis and appropriate variable selection for the model.
    • Partially Achieved (10-15 points): Basic analysis conducted; variable selection could be improved.
    • Not Achieved (0-9 points): Inadequate analysis and poor variable selection.
  3. Model Implementation and Testing (30 points)

    • Achieved (30 points): Accurate implementation and testing of all required models with a well-constructed comparative table.
    • Partially Achieved (15-25 points): Implementation and testing of most models; comparative table lacks detail.
    • Not Achieved (0-14 points): Incomplete model implementation or testing; no comparative table.
  4. K-means Clustering Analysis (20 points)

    • Achieved (20 points): Effective use of K-means clustering with clear explanations of each cluster.
    • Partially Achieved (10-15 points): Basic use of K-means clustering; explanations of clusters are vague.
    • Not Achieved (0-9 points): Poor or no use of K-means clustering; no explanations of clusters.
  5. Oral Presentation (10 points)

    • Achieved (10 points): Clear, concise, and well-organized presentation within the time limit.
    • Partially Achieved (5-9 points): Presentation is understandable but could be more organized or better timed.
    • Not Achieved (0-4 points): Unclear or disorganized presentation; significantly exceeds or falls short of the time limit.